AITopics | layer normalization

layer normalization

Using Fast Weights to Attend to the Recent Past

Jimmy Ba, Geoffrey E. Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu

Neural Information Processing Systems | May-1-2026, 06:05:55 GMT

Until recently, research on artificial neural networks was largely restricted to systems with only two types of variable: Neural activities that represent the current or recent input and weights that learn to capture regularities among inputs, outputs and payoffs. There is no good reason for this restriction. Synapses have dynamics at many different time-scales and this suggests that artificial neural networks might benefit from variables that change slower than activities but much faster than the standard weights. These "fast weights" can be used to store temporary memories of the recent past and they provide a neurally plausible way of implementing the type of attention to the past that has recently proved very helpful in sequence-to-sequence models. By using fast weights we can avoid the need to store copies of neural activity patterns.
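The idea of a memory that decays faster than the standard weights but persists longer than activities can be sketched in a few lines of plain Python. This is only an illustration under assumptions not stated in the abstract above: the decaying outer-product update rule, the helper names `update_fast_weights` and `recall`, and the values of `decay` and `lr` are all chosen here for the example.

```python
def update_fast_weights(A, h, decay=0.95, lr=0.5):
    """Return decay * A + lr * outer(h, h) (assumed update rule).

    A is a square list-of-lists matrix and h is a hidden activity vector.
    Older contributions shrink by `decay` at every step.
    """
    n = len(h)
    return [[decay * A[i][j] + lr * h[i] * h[j] for j in range(n)]
            for i in range(n)]


def recall(A, q):
    """Multiply the fast weight matrix by a query vector q.

    The result is a sum of past activity vectors, each weighted by its
    dot product with q and by how recently it was stored, so no explicit
    copies of past activities are kept.
    """
    return [sum(A[i][j] * q[j] for j in range(len(q))) for i in range(len(A))]
```

Storing two orthogonal patterns one after the other and then querying with the first returns that pattern scaled down by one decay step, while querying with the second returns it at full strength. This is the sense in which the matrix attends to the recent past.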

artificial intelligence, fast weight, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > Canada > Ontario > Toronto (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

integration

Neural Information Processing Systems | Apr-25-2026, 12:22:40 GMT

The current operator library with quantized operators is not feasible for vision transformer inference because of specific operators, including the GeLU activation and layer normalization. Layer normalization (LayerNorm) normalizes the activations of each layer in a neural network independently, reducing internal covariate shift and improving training stability, as follows: $\mathrm{LayerNorm}(x) = \frac{\gamma}{\sqrt{\mathrm{Var}(x)+\epsilon}}(x-\mu)+\beta$ (1), where $x$ is the input tensor. We construct surrogate equations with fixed-point iterative methods to calculate the output of the square root operators, inspired by I-BERT [3]. We provide the details of how to approximate the square root operators in Algorithm 1. GeLU requires the cumulative distribution function (CDF) of the Gaussian distribution; we approximate the activation function by Equation 2 [1].
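For illustration, Equation (1) and an integer-only square root can be sketched in plain Python. This is a sketch under stated assumptions, not the paper's Algorithm 1: the Newton iteration, the function names `isqrt_newton` and `layer_norm`, the default `eps`, and the use of the population variance over a single feature vector are all choices made here for the example.

```python
import math


def isqrt_newton(n: int) -> int:
    """Floor of sqrt(n) using only integer adds, shifts and divisions."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    # Initial guess: a power of two that is at least sqrt(n).
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        y = (x + n // x) >> 1  # Newton step for f(x) = x^2 - n
        if y >= x:             # the sequence stopped decreasing
            return x
        x = y


def layer_norm(x, gamma, beta, eps=1e-5):
    """Floating-point reference for Equation (1) on one feature vector."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mu) * inv_std + b for v, g, b in zip(x, gamma, beta)]
```

With `gamma` set to ones and `beta` to zeros, the output of `layer_norm` has mean close to zero and variance close to one. A quantized kernel would replace the `math.sqrt` call with an integer routine such as `isqrt_newton` applied to a suitably scaled variance.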

artificial intelligence, machine learning, search space, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models and Time-Dependent Layer Normalization

Neural Information Processing Systems | Mar-22-2026, 20:43:59 GMT

This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. Diffusion models have gained prominence for their effectiveness in high-fidelity image generation. While conventional approaches rely on convolutional U-Net architectures, recent Transformer-based designs have demonstrated superior performance and scalability. However, Transformer architectures, which tokenize input data (via patchification), face a trade-off between visual fidelity and computational complexity due to the quadratic nature of self-attention operations with respect to token length. While larger patch sizes enable attention computation efficiency, they struggle to capture fine-grained visual details, leading to image distortions. To address this challenge, we propose augmenting the **Di**ffusion model with the **M**ulti-**R**esolution network (DiMR), a framework that refines features across multiple resolutions, progressively enhancing detail from low to high resolution. Additionally, we introduce Time-Dependent Layer Normalization (TD-LN), a parameter-efficient approach that incorporates time-dependent parameters into layer normalization to inject time information and achieve superior performance. Our method's efficacy is demonstrated on the class-conditional ImageNet generation benchmark, where DiMR-XL variants surpass previous diffusion models, achieving FID scores of 1.70 on ImageNet $256 \times 256$ and 2.89 on ImageNet $512 \times 512$. Our best variant, DiMR-G, further establishes a state-of-the-art 1.63 FID on ImageNet $256 \times 256$.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.59)

Transformers on Markov data: Constant depth suffices

Neural Information Processing Systems | Feb-18-2026, 18:25:42 GMT

Attention-based transformers have been remarkably successful at modeling generative processes across various domains and modalities.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Communications (0.68)

Normalization and effective learning rates in reinforcement learning

Clare Lyle

Neural Information Processing Systems | Feb-17-2026, 22:21:43 GMT

Layer normalization has demonstrated remarkable effectiveness at preventing plasticity loss in continual and reinforcement learning (RL), though the precise reasons for this effectiveness remain mysterious.

machine learning, normalization, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

8ba80c47b9d3dced79ee835b7d3bf72a-Paper-Conference.pdf

Neural Information Processing Systems | Feb-15-2026, 18:33:22 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Medford (0.05)
North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Iceland > Capital Region > Reykjavik (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Root Mean Square Layer Normalization

Biao Zhang, Rico Sennrich

Neural Information Processing Systems | Feb-11-2026, 15:57:47 GMT

Neural Information Processing Systems http://nips.cc/

layernorm, normalization, rmsnorm, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Spain (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Theoretical

Neural Information Processing Systems | Feb-11-2026, 08:26:04 GMT

The question of if and how rank collapse affects training is still largely unanswered, and its investigation is necessary for a more comprehensive understanding of this architecture.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: